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A' 

Newton-type  methods  and  quasi- Newton  methods  hare  proven  to  be  very 
successful  in  solving  dense  unconstrained  optimisation  problems.  Recently  there 
has  been  considerable  interest  in  extending  these  methods  to  solving  large  prob¬ 
lems  when  the  Hessian  matrix  has  a  known  a  priori  sparsity  pattern.  This  paper 
treats  sparse  quasi-Newton  methods  in  a  uniform  fashion  and  shows  the  effect  of 
loss  of  positive-definiteness  in  generating  updates.  These  sparse  quasi-Newton 
methods  coupled  with  a  modified  Cholesky  factorisation  to  take  into  account 
the  loss  of  positive-definiteness  when  solving  the  linear  systems  associated  with 
these  methods  were  tested  on  a  large  set  of  problems.  The  overall  conclusions 
are  that  these  methods  perform  poorly  in  general  —  the  Hessian  matrix  becomes 
indefinite  even  close  to  the  solution  and  superlinear  convergence  is  not  observed 
in  practice. 
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OPTIMIZATION  OF  UNCONSTRAINED  F UNCTION8  WITH 
SPARSE  HESSIAN  MATRICES  —  QUASI-NEWTON  METHODS 


1.  Introduction 

The  problem  of  concern  in  this  paper  is  the  unconstrained  minimisation  of 
a  twice-continnously  differentiable  function 

minimise  /(*).  (1.1) 

c£R" 

We  shall  consider  the  class  of  quasi-Newton  methods  applied  to  problem  (1.1) 
when  the  Hessian  matrix  of  the  function  /(*)  has  a  known  a  priori  sparsity 
pattern.  The  first  quasi-Newton  method  was  suggested  by  Davidon  (1959)  and 
extended  by  Fletcher  and  Powell  (1963).  Since  then  there  has  been  an  explosion 
of  interest  in  these  methods.  (For  a  comprehensive  survey  of  quasi-Newton 
methods  see  Dennis  and  Mori,  1977). 

The  idea  behind  the  most  popular  quasi-Newton  methods  for  (1.1)  is  to 
maintain  a  positive-definite  symmetric  matrix  that  approximates  the  Hessian 
matrix  of  f(x).  If  we  let  z*  denote  the  Jfcth  iterate,  a  quasi-Newton  algorithm 
obtains  a  descent  direction  p*  by  solving  the  system  of  equations 

BkPk  =  ~9k,  (1.2) 

where  5*  is  an  approximation  to  the  Hessian  matrix  at  iteration  k  and  gk  is 
the  gradient  of  /  at  z*.  If  Bk  is  positive-definite,  pk  is  guaranteed  to  be  a 
descent  direction.  Once  having  obtained  Pk,  the  new  point  Zk+i  is  given  by  z*-f 
a*Pk.  where  a*  >  0  is  chosen  so  that  f{zk+i)  is  “sufficiently”  less  than  /(z*) 
(for  a  precise  definition  of  this  term,  see,  for  example,  Ortega  and  Rheinboldt, 
1970).  If  the  new  point  *k+\  does  not  satisfy  the  convergence  criteria,  a  new 
approximation  to  the  Hessian  matrix  2?*+i  is  defined  by 

=  <PkBk  +  Uk,  (1.3) 

where  <Pk  is  a  scalar,  and  Uk  is  a  matrix  chosen  so  that  Bk+i  is  symmetric, 
positive-definite  and  satisfies  the  quasi-Newton  condition 

Bk+i*k  =  Vk,  (1A) 


with 

=  *fc+i  —  *ad  yk  =  9k+\  —  9k  • 

Usually,  Uk  is  a  matrix  of  low  rank.  The  most  popular  update  is  the  BFGS 
(Brqyden-Fletcher-Goldfarb-Shanno)  update  (see,  e.g.  Dennis  and  Mori,  1977), 
in  which 


a 
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<Pk  =  1 

m  _  W*  _  Bk»k»IBk 

~%V~k  *kBk*k 


(1.5) 


To  guarantee  that  the  updated  matrices  are  positive  definite,  we  require  that 
sfy*  >  0  (which  can  be  ensured  by  a  suitable  choice  of  a*). 

Quasi-Newton  methods  have  been  very  successful  in  solving  unconstrained 
and  constrained  optimisation  problems  of  moderate  sise.  The  difficulty  in  ap¬ 
plying  these  methods  as  described  above  to  large  problems  is  that  a  symmetric 
n  X  n  matrix  (or  a  factorisation  of  a  matrix)  must  be  stored.  However,  many 
large  problems  have  a  sparse  Hessian  matrix  whose  sparsity  pattern  is  known  (or 
can  be  determined)  a  priori.  In  these  cases  it  would  be  possible  to  maintain  a 
suitably  sparse  approximation  to  the  Hessian  matrix.  Note  that,  in  general,  the 
quasi-Newton  methods  described  above  generate  dense  approximations  to  the 
Hessian,  regardless  of  the  sparsity  structure  of  the  true  Hessian.  Much  recent 
research  has  been  directed  towards  generating  Hessian  approximations  that  have 
the  same  sparsity  structure  as  the  true  Hessian  (see  Dennis  and  Schnabel,  1978; 
Marwil,  1978;  Shanno,  1980;  Toint,  1977,  1978,  1979).  This  paper  treats  various 
aspects  of  sparse  versions  of  quasi-Newton  methods  in  a  uniform  manner;  and, 
reports  results  of  extensive  numerical  testing  of  sparse  quasi-Newton  methods 
combined  with  a  modified  Cholesky  factorization  to  provide  a  positive-definite 
matrix  for  computing  the  search  direction. 

Section  2  describes  the  notation  to  be  used  in  the  rest  of  the  paper.  Sparse 
quasi-Newton  updates  are  described  in  Section  3.  Section  4  describes  positive¬ 
definiteness  as  related  to  these  quasi-Newton  methods.  Storage  and  efficiency 
are  considered  in  Section  5;  and  convergence  of  these  methods  are  described  in 
Section  6.  Finally,  Section  7  discusses  computational  aspects  of  these  methods 
and  Section  8  summarizes  the  important  results. 


2.  Definitions  and  Notation 

In  the  remainder  of  this  paper,  the  subscript  k  will  be  dropped  and  the 
subscript  k  +  1  will  be  replaced  by  the  superscript  “+*. 

Let  G(x)  denote  the  Hessian  matrix  of  /  at  x,  and  let 

N  =  {(«,  j)  :  [G(x)]y  =  0  for  all  x  6  »n>,  (2.1) 

that  is,  the  set  N  represents  the  sparsity  pattern  of  the  true  Hessian  matrix  of 
the  function  being  minimised.  Note  that  the  sparsity  pattern  is  assumed  to  be 
known  and  fixed,  and  that  any  additional  zeros  created  during  execution  of  the 


Ss- 

algorithm  are  treated  as  nonseros.  Let 
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S 


:»,/=l,2,...,»}\N 

=  {(*i  i)  '  (^(*)]<^  5^  0  for  at  least  one  c  6  R*}.  (2.2) 

For  any  symmetric  matrix  A,  define  matrices  An  and  Ajj  as  follows 


►Hl-C"’ 

for  (*t  3)  €  N, 
for  (*,/)€  ^» 

(2-3) 

for  (t,/)  €  N, 
for  (*,  f)  e  TT. 

(2.4) 

In  words,  An  is  the  matrix  A  with  seros  in  the  positions  corresponding  to 
the  nonseros  of  (7(c);  and  A-jy  is  the  matrix  A  with  seros  in  the  positions 
corresponding  to  the  seros  of  (7(c).  Then  A  can  be  written  as 


A  =  An  -f  Ajjr.  (2.5) 

Define  D,  to  be  a  diagonal  matrix  whose  diagonal  elements  are  0  or  1  depending 
on  the  sparsity  pattern  of  the  sth  row  of  (7(c),  that  is, 


if  (t,  j)  E  N, 
if  (*\;)  <EiV. 


(2.6) 


Finally,  define  a*  =  D<a  for  any  vector  s. 


8.  Sparse  Quasi-Newton  Updates 


In  the  dense  case,  it  is  possible  to  obtain  an  updated  Hessian  approximation, 
B+,  that  is  symmetric,  positive-definite,  and  satisfies  the  quasi-Newton  condition 

B+a  =  y,  (3.1) 

where  a  =  c+  —  c  and  y  =  y+  —  y.  When  generating  updates  that  satisfy  spar¬ 
sity  requirements,  it  is  not  always  possible  to  obtain  an  Hessian  approximation 
J3+  that  is  positive-definite  and  also  satisfies  the  quasi-Newton  condition  (see, 
for  example,  Shanno,  1980). 

Let  B  be  the  sparse  symmetric  n  X  n  matrix  representing  the  approximation 
to  the  Hessian  matrix  at  the  start  of  iteration  k  of  a  quasi-Newton  algorithm.  If 
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sparsity  were  the  only  issue,  sparse  updates  could  be  developed  by  setting 


B+=nB+  Ujt,  (3.2) 

where  tjB  +  U  satisfies  the  quasi-Newton  condition  (3.1)  and  Ujj  is  defined  by 
(2.4).  Another  possibility  would  be  to  obtain  the  factors  of  B+  by  updating  the 
factors  of  B  (see,  for  example,  Gill,  Golub,  Murray  and  Saunders,  1974)  while 
ignoring  fill-in.  However,  neither  of  these  approaches  would  result  in  updates 
that  satisfy  the  quasi-Newton  condition  (3.1). 

A  sparse  version  of  any  quasi-Newton  update  can  easily  be  obtained  by 
applying  Schubert’s  (1970)  technique.  For  example,  a  sparse  version  of  the  BFGS 
update  (1.5)  can  be  obtained  as 


&BFGS 


=  E 


T 

e-w  ■ . 


(3.3) 


where 


r  =  ,t(vJvY  _ 

•'  ' \‘T(V‘)  (lTB)U  ) 


(3.4) 


However.  B£rGS  is  not  symmetric.  To  ohtnin  a  symmetric  version  (see,  for 
example,  Shanno,  1980),  we  redefine  the  updated  matrix  as 


bbfgs  =  Z)  +  Wiei  I.  (3-5) 

fi'—l 

where  to,-  is  defined  by  (3.4)  and  the  X,’s  are  chosen  so  that  B%ras  satisfies  the 
quasi-Newton  condition  (3.1). 

One  of  the  disadvantages  of  this  approach  is  that  the  matrix  in  the  system  of 
equations  obtained  from  (3.5)  to  define  X,  need  not  be  positive-definite.  Another 
problem  is  that  the  line  search  used  guarantees  that  yT»  >  0  but  does  not 
necessarily  guarantee  that  (y*)T»  >  0  and  (8TB)'s  >  0  for  all  i.  In  fact,  this 
sparse  version  of  the  BFGS  update  has  proved  to  be  computationally  unstable 
due  to  (y*)r«  and/or  ($TB)it  >  0  being  sero  or  close  to  sero  (see,  for  example, 
Shanno,  1980). 

An  alternative  technique  for  obtaining  sparse  quasi-Newton  updates  is  to 
use  norm  minimisation,  since  some  dense  quasi-Newton  updates  can  be  defined 
as  least-change  updates  in  appropriate  norms  (see,  for  example,  Dennis  and 
Schnabel,  1978).  Toint  (1977)  showed  how  to  obtain  a  sparse  analog  of  the 
Powell-Symmetric-Broyden  update  (see,  for  example,  Dennis,  and  Mor6,  1977) 
by  finding  a  matrix  E  such  that  B+  —  B  +  E  is  closest  to  Bin  the  Frobenius 
norm;  has  the  sparsity  pattern  specified  by  the  set  N  (Equation  2.1);  and  satisfies 
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the  quasi-Newton  condition  (3.1).  Formally,  the  problem  can  be  stated  as 


»  n 


minimize 

(3.6) 

1 

subject  to 

Es=  y  —  Bs 

(3.7) 

Eij  =  0  (.',,•)  6  W 

(3.8) 

E—  Et. 

(3.9) 

Theorem  3.1  (Toint,  1377).  The  unique  solution  to  problem  NM1  is  given  by 


fO,  it(i,i)eN; 

V  +  if(«,i)eV; 

where  X  =  (Xi,  Xa, . . . ,  Xn)r  is  the  solution  of  the  linear  system 

Q\  =  y  —  Bs  {—  Ea) 

with  Q  defined  by 

Qij  =  («'W).  +  Il«ill3  «*,  tor  nil  ( i,j) 


(3.10) 

(3.11) 

(3.12) 


And  is  the  Kronecker  delta. 

Proof.  The  proof  involves  finding  the  stationary  point  of  the  Lagrangian  of 
problem  NM1.  For  details,  see  Toint  (1977)  or  Dennis  and  Schnabel  (1978).  | 

Obtaining  the  desired  update  by  Toint’s  method  requires  the  solution  of  a 
linear  system  of  equations  (3.11).  The  matrix  Q  in  this  system  has  the  same 
sparsity  pattern  as  the  matrix  B  and  is  positive-definite  if  and  only  if  ||s*||  >  0 
for  all  t. 

Note  that  (3.10)  and  (3.12)  can  be  rewritten  in  matrix  notation  as  follows: 


*  =  £  X,-(e.(.*)r  +  .‘efl,  (3.13) 

t»l 

e  =  E((*i)*i+ii*iiil«<]<r  (3.i4) 

i=l 

In  some  situations  it  may  be  appropriate  to  use  a  weighted  Frobenius  norm 
as  the  minimizing  function  in  problem  NM1  (see,  for  example,  Dennis  and 
Schnabel,  1978).  The  difficulty  with  using  an  arbitrary  symmetric  positive- 
definite  weighting  matrix  W  is  that  it  is  not  easy  to  obtain  a  closed  form  solution 
to  this  problem.  However,  when  W  is  a  diagonal  matrix  with  positive  diagonal 
elements,  a  closed  form  solution  for  this  problem  exists  (see,  Dennis  and  Schnabel, 
1978;  Powell,  1979;  Toint,  1977). 


Theorem  3.1  indicated  how  a  tparse  analog  of  the  Powell-Symmetric-Broyden 
update  could  be  obtained.  Shanno  (1980)  derived  a  sparse  analog  to  the  BFGS 
update  and  indicated  that  his  method  could  be  extended  to  other  symmetric 
updates.  However,  his  method  is  rather  complicated  and  not  at  all  easy  to  apply 
to  other  updates.  The  following  theorem  shows  how  sparse  analogs  to  symmetric 
updates  can  be  derived  as  a  simple  extension  of  Toint’s  (1977)  results. 

Theorem  3.3.  Let  B+  =  t jB-j-U  where  V  is  symmetric,  but,  in  general,  will  not 
have  its  sparsity  pattern  speciffed  by  N  (see  Equation  2.1);  r;  is  some  scale  factor; 
and  B  s  —  y.  Then,  the  symmetric  matrix  B+  that  is  closest  to  Bjy  in  the 
Frobenius  norm,  and  that  satisfies  the  quasi-Newton  condition  (3.1)  is  obtained 
as 


(3.15) 

where  E  is  the  solution  to  problem  NM2 

n  n 

NM2  minimise 

=  E  E 

(3.16) 

subject  to 

Es  —  &+* 

(317) 

Eij  —  o  (*,;■)  e  n 

(3.18) 

Si' 

fci 

II 

(3.19) 

Proof.  By  definition 

St  =  Us 

(3.20) 

and 

(3.21) 

From  (3.15), 


B+ s  =  (bjr E)s 

=  (B+  —  B n  -f  E)t  (by  definition  of 
=  V  —  —  E)s. 

Hence,  it  follows  that  B+  satisfies  the  quasi-Newton  condition  (3.1)  if  and  only  if 
{b^f—E)s  —  0  or  Es  =  fltfi.  This  gives  us  condition  (3.17).  Conditions  (3.18) 
and  (3.19)  follow  from  considerations  of  sparsity  and  symmetry  on  the  matrix 
E.  § 


A 
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Corollary  34.  The  solution  to  problem  NM2  is 
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_  fO, 

11  lx,«j  +  X,»i,  it  («,/)  e  Tf\ 
where  X  =  (Xj,  X2, . . . ,  Xn)r  is  the  solution  of  the  linear  system 

Q\  =  BtfS  (=  Es) 


(3.22) 


(3.23) 


with  Q  defined  by  (3.12). 

Proof.  The  proof  follows  from  Theorems  3.1  and  3.2  after  noting  that  problems 
NM1  and  NM2  differ  only  in  equations  (3.7)  and  (3.17).  | 

Analogous  to  Toint's  result  with  the  weighted  Frobenius  norm,  we  can  extend 
Theorem  3.2  to  the  weighted  Frobenius  norm  (see  Thapa,  1980). 

From  the  above  corollary  we  see  that  the  sparse  analogs  of  quasi- Newton 
updates  require  the  solution  of  a  system  of  equations  (see  Equation  3.23)  whose 

A  -f-  A 

right-hand  side  is  BNa.  Shanno  (1980)  indicates  that  the  computation  of  BNs 
does  not  require  the  storage  of  the  elements  of  Us  (see  Definition  2.1),  but  does 
require  the  computation  of  the  elements  of  Us-  However,  the  following  lemma 
shows  that  the  elements  of  Us  need  not  be  computed  either. 

Lemma  3.4. 

Bs»  =  V  —  vBs  —  UjyS.  (3.24) 


Proof. 

Bs»  =  UN»  (from  (3.20)) 

=  (U  —  Ujj)s  (by  definition  of  Us) 

=  (&+  —  ijB)s  —  UjjS  (since  &+  =  ijB  -f  U ) 

—  y  —  TjBs  —  UjfS.  | 


Toint’s  update  and  the  sparse  analogs  of  the  dense  quasi-Newton  updates 
modify  all  the  elements  of  the  Hessian  matrix  to  obtain  a  sparse  version  that 
satisfies  the  quasi-Newton  condition.  It  may  be  of  some  importance  to  consider 
obtaining  sparse  updates  that  modify  the  diagonal  in  a  predetermined  way. 
Then,  for  example,  we  could  obtain  a  sparse  approximation  to  the  BFGS  update 
that  has  the  same  diagonal  as  the  diagonal  that  would  be  obtained  if  the  dense 
BFGS  update  were  used  at  the  current  iteration  of  the  optimisation  algorithm. 
Further  details  oi  this  approach  are  given  in  Thapa,  1980. 
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4.  On  PMitivi>dtflnit«Nii 

An  implementation  of  a  quasi-Newton  algorithm  using  a  sparse  analog  of  a 
dense  update  requires  the  solution  of  two  systems  of  equations.  First,  a  system 
of  the  form  Q\  =  r  is  solved  to  obtain  the  sparse  Hessian  approximation.  Next, 
a  system  of  the  form  Bp  =  — g  is  solved  to  obtain  a  descent  direction  p. 


4.1  The  Hessian  Approximation. 

If  the  Hessian  approximation  B  is  not  positive-definite,  the  solution  of  the 
system  of  equations  Bp  =  — y  may  not  result  in  a  descent  direction.  Previous 
work  on  quasi-Newton  methods  has  treated  an  indefinite  B  with  a  trust-region 
strategy  (see,  for  example,  Dembo  et  al,  1980;  Sorensen,  1980;  Steihaug,  1980). 
The  idea  behind  these  methods  is  to  choose  a  parameter  7  such  that  B  =  B-j-'jI 
is  positive-definite  and  to  define  the  search  direction  as  the  solution  of 

(B  -I-  7 1)P  =  —9, 
so  that  p  is  a  descent  direction. 

This  paper  studies  a  different  method  for  obtaining  a  descent  direction.  We 
use  a  method  based  on  the  Cholesky  factorisation  (or  LDLT  factorization,  where 
L  is  unit  lower  triangular  and  D  is  a  positive  diagonal  matrix)  Tor  symmetric 
positive-definite  matrices.  The  j-th  column  of  the  matrix  L  is  defined  from 
columns  1  through  j  —  1  by  the  equations 

=  Bii  -  E 

•  =1 

1  3~l 
li}  =  *7 -{Bij  — 

We  use  a  numerically  stable  method  to  construct  a  matrix  B  from  a  modified 
Cholesky  factorisation  of  B.  (For  a  detailed  description  of  the  modified  Cholesky 
factorisation  seee  Gill,  Murray,  and  Wright  (1981)).  A  descent  direction  p  is  then 
obtained  as  the  solution  of  Bp  =  —9.  With  this  approach,  the  Cholesky  factors 
L  and  D  are  computed  subject  to  two  requirements:  all  elements  of  D  are  strictly 
positive,  and  the  elements  of  the  factors  satisfy  a  uniform  bound,  i.e.  for  Jb  = 
1, . . . ,  n  and  some  positive  value  p  it  holds  that 

dh>6  and  |J,*<ff|  ^  «  >  *•  (4.1) 

The  matrices  L  and  D  are  computed  by  implicitly  increasing  the  diagonal  ele¬ 
ments  of  the  original  matrix  during  the  factorisation  in  order  to  satisfy  (4.1). 


§4-1 
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When  this  process  is  completed,  the  matrices  L  and  D  are  the  Cholesky  factors 
of  a  positive-definite  matrix  B  that  satisfies 


B  =  LDLt  =  B  +  P, 

where  P  is  a  non-negative  diagonal  matrix  whose  j-th  element  is  pjy 

The  modified  Cholesky  factorization  is  a  numerically  stable  method  that 
produces  a  positive-definite  matrix  differing  from  the  original  matrix  only  in  its 
diagonal  elements.  The  diagonal  modification  is  optimal  in  the  sense  that  an  a 
priori  bound  upon  the  norm  of  P  is  minimized,  subject  to  the  requirement  that 
sufficiently  positive-definite  matrices  are  left  unaltered.  In  practice,  the  actual 
value  of  P  is  almost  always  substantially  less  than  the  a  priori  bound. 

Once  having  obtained  a  direction  of  descent  by  the  modified  Cholesky  fac¬ 
torization  we  determine  the  steplength  along  p  such  that  the  function  value  at 
the  new  point  i-f-ap  is  “sufficiently"  lower  than  the  function  value  at  the  current 
point  x.  In  order  to  obtain  such  a  decrease,  the  most  important  condition  on  the 
parameter  a  is 


I g{x  +  ap)rp|  <  —rigTpT,  for  0  <  rj  <  1, 
where  ij  is  called  the  linesearch  accuracy  parameter. 

Above  we  discussed  techniques  to  obtain  a  descent  direction  p  when  the 
Hessian  approximation  is  not  positive-definite.  The  question  then  is  whether  it 
is  possible  to  alter  the  approximation  B  to  obtain  B+  such  that  B+  is  symmetric, 
satisfies  the  quasi-Newton  condition  (3.1)  and  is  positive-definite.  Toint  (1979) 
has  shown  that  it  is  indeed  possible  to  do  so  under  certain  restrictions  on 
the  sparsity  pattern  of  B.  His  result  is  of  theoretical  importance,  but  is  not 
practically  implementable. 

Note  that  when  using  the  Levenberg-Marquardt  procedure  or  the  sparse 
modified  Cholesky  factorization  we  solve  the  modified  system  Bp  =  — g  (where 
B  =  B  +  \I  in  the  former  case  and  B  =  B  P,  where  P  is  a  diagonal  matrix, 
in  the  latter  case).  An  obvious  approach  to  obtaining  a  positive-definite  Hessian 
approximation  would  be  to  replace  B  by  B.  In  this  case  B  would  have  the  re¬ 
quired  sparsity  pattern,  be  symmetric  and  positive-definite,  but  would  not  satisfy 
the  quasi-Newton  condition  (3.1).  However,  it  would  be  hoped  that  the  next  ap¬ 
proximation  B+  obtained  from  B  would  be  “better”  than  the  approximation  B+ 
obtained  from  B.  Unfortunately,  it  turns  out  that  B+  can  be  more  indefinite 
than  B+.  To  show  this  we  first  need  some  new  definitions  and  notation.  It  should 
be  noted  that  the  technique  described  below  can  be  applied  to  any  method  if  we 
regard  B  as  the  starting  positive-definite  approximation. 

F  -  Diagonal  modification  to  make  B-\-  P  sufficiently 
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positive-definite. 

(4.2) 

B 

=  B-\-P. 

(4.3) 

B+ 

=  B  +  Ujr  +  E. 

(4.4) 

B+ 

=  B  -+•  JJjf  +  B. 

(4.5) 

X 

-  multipliers  for  obtaining  E. 

(4.6) 

X 

-  multipliers  for  obtaining  B. 

(4.7) 

u 

-  Update  obtained  using  B. 

(4.8) 

V 

-  Update  obtained  using  B. 

(49) 

hi(P) 

=  17— U. 

(4.10) 

h{P) 

—  Ai(P)  -+• 17  P. 

(4.11) 

From  (4.11)  and  (4.10), 

mi, = i*i(p)]w 

(4.12) 

and 

(413) 

Next,  note  that 

riB  +  V=tiB  +  U  +  h{P). 

(4.14) 

Now,  from  Corollary  3.3,  B+  can  be  obtained  from  B  as  follows  : 

_f°,  if  (i.j)  €  N-, 

*J  +  X(iy  +  Xyti,  if(*.j)G^; 

(4.15) 

where  X  =  (Xx,  Xa, . . . ,  Xn)T  solves 

Q\  =  ££«  (4.16) 


with  Q  defined  by  equation  (3.12)  and 

&+  =  tlB+U.  (4.17) 

Again  ,  from  Corollary  3.3,  it  follows  that  B*  can  be  obtained  from  B  as 


K-  =  {°*’+ 

(Etf  +  "f* 

if  (»,i)  e  N; 

if  (•./)  6lf; 

(4.18) 

where  X  =  (Xi,Xa, . . .  ,Xn)r  solves 

(4.19) 

with  Q  defined  by  equation  (3.12)  and 

B+  =  u B+tr. 

(4.20) 

The  next  theorem  shows  the  relationship  between  X  and  X  and  between  IT4  and 
B. 


$4.1  Tie  Hwlu  Approximation 

Theorem  4.1*  Assume  ||*‘||  >  0  for  all  i.  Let  *  solve 


ll 


Q*  =  [*(P)W‘- 

(4.21) 

Then 

(a)\  =  \  —  x 

(4.22) 

(b>  =  {b+  +  -  (*<»,  +  *,*<), 

N; 

(4.23) 

where  Q  is  defined  by  (3.12);  T?  end  N  are  defined  by  (2.1)  and  (2.2)  ;  and  the 
rest  of  the  quantities  are  defined  by  equations  (4.2)  -  (4.14). 

Proof. 


*  + 

T3ss  =  y  —  rfBs  —  V 

=  v  -  n(B  +  P)s  -  (uv  - 1 MP)]*r)  • 

—  V  —  vBs  —  Uws  -  ( r,P  +  [MP)]^)  s 
=  y  —  r)Ba  —  U-^a  —  [h{P)}wa 

=  &*,-{k(.P))wi,  (4.24) 

where  the  first  and  fifth  equalities  follow  from  Lemma  3.7,  the  second  equality 
follows  from  (4.3)  and  (4.10),  and  the  fourth  equality  follows  from  (4.11). 

From  (4.19)  we  obtain, 


Q ^  v  9 

=  B*a  -  [h(P)]jr«  from  (4.24) 

=  Q\  -  [h(P)}ws  from  (4.16). 

Since  ||«f  ||  >  0  for  all  i  ,  it  follows  from  Lemma  3.2  (b)  that  Q~l  exists.  Hence 

X  =  X  —  Q  1[h(P)]^r« 

=  X  —  z, 


which  is  (4.22). 

Next,  we  prove  (4.23).  From  (4.18)  it  follows  that  for  (s’,  j)  6  N,  —  0; 
and  for  (*,  j)  6  T? 

X;S,- 

=  bt  -H  [MP)]*i  +  (X<  -  *i)*i  +  (X,-  -  *;)*<  from  (4.14)  and  (4.22) 

=  Br.  +  [h(P)j0-  -  (ziSj  +  ZjBi).  from  (4.15) 

I 
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The  above  theorem  shows  the  effect  of  a  diagonal  modification  on  sparse 
updates.  From  equation  (4.23)  in  the  statement  of  the  theorem,  it  is  clear  that 
if  the  Xi  and  are  of  the  appropriate  sign  and  magnitude  then  it  is  possible  for 
W*"  to  be  more  indefinite  than  B+. 

The  following  example  illustrates  that  can  be  more  indefinite  than  B. 
We  shall  only  summarise  the  results.  The  full  calculations  are  given  in  Thapa 
(1980).  We  use  Toint’s  update  and  a  sparse  modified  Cholesky  factorisation  to 
compute 

Example  4.1.  Note  that  for  Toint’s  update,  h{P)  =  P,  <p  =  1,  U  =  0.  Hence, 

*  =  Q~xPt 


and 


Let 


f°' 

B+.  =  jB+  +  P«-2*ts,, 

(b  J  -  (SfSj  +  Zjti), 


for  (i,])€N 
for  t  =  1, . . . , » 
for  (*,j)  6  3- 


B  = 


/25 

5 

3 

0 


V  0 


5  3  0  0\ 

12  0  0  9 

0  0.2  0  0 

0  0  5  4 

9  0  4  1/ 


g  =  (1,  1,  20,  1,  —  2)r,  y  =  (2,  4,  3,  —1,  —  2)r,  and  the  step-length  a  —  1. 

All  the  numbers  shown  have  been  rounded  to  four  digits.  The  modified 
Cholesky  factors  of  B  are 


with 


Therefore, 


Lb  ~ 


1 

.2 

.12 

0 

0 


1 

—.05455  1 

0  0 

.8182  2.54T 


1 


.8 


\ 


> 


1/ 


Db  —  diag(25,  11,  .1927,  5,  10.81), 


Pb 

=  diag(0,  0, 

.3855, 

0,  21 

.63). 

^25.03 

6.219 

2.868 

0 

0\ 

6.219 

10.39 

0 

0 

2.722 

— 

2.868 

0 

.3693 

0 

0 

0 

0 

0 

3.140 

2.438 

\  0 

2.722 

0 

2.438 

7.356/ 

(4.25) 


$4.2  So  Mb*  QX  =  r  IS 

/25.01  5.122  2.965  0  0\ 

5.122  11.83  0  0  9.260 

ET1"  =  2.965  0  .3826  0  0  . 

0  0  0  5.061  4.052 

0  9.260  0  4.052  22.42/ 

Hence,  we  get 

IIPsIll  =  488.0, 

I  Vlg  =  .00113, 

and 

II  VHa  =  760-5' 

Therefore, 

I  VHa  >  Hp*+lla- 

Alao  ||PB+||1  <  ||PbII?  «nd  |  VIII  >  11^*111* 

The  example  described  above  was  generated  by  using  a  modified  Cholesky 
factorization.  However,  it  would  be  easy  to  generate  a  similar  example  for  trust- 
region  methods. 

4.2  Solving  Q\  —  r. 

The  matrix  Q  in  Equation  (3.12)  is  positive-definite  if  and  only  if  ||s*||  >  0 
for  t  =  1, . . . ,  n  (see  Toint,  1977).  If  any  of  the  s*  has  a  zero  norm,  then  the 
matrix  Q  is  singular  (and  positive  semi-definite).  In  this  case  the  system  Q\  =  r 
may  have  no  solution.  Toint  (1977)  suggested  setting  \j  and  r;  to  zero  for  all 
j  such  that  ||«J  ||  =  0,  and  solving  the  reduced  system  obtained  by  deleting  the 
zero  rows  and  columns  of  Q.  Another  technique  for  solving  the  system  when 
Q  is  singular  is  to  use  the  sparse  version  of  the  modified  Cholesky  factorization 
(see  Thapa,  1980).  Both  these  techniques  generate  X,’s  that  produce  Hessian 
approximations  that  do  not  necessarily  satisfy  the  quasi-Newton  condition  (3.1). 

Numerical  experience  has  shown  that  not  only  is  it  possible  for  Q  to  be 
singular,  but  that  Q  may  remain  singular  for  many  iterations.  The  following 
example  illustrates  this  fact. 

Example  4.2.  Consider  a  generalization  of  the  well  known  two-dimensional 
Rosenbrock  function  (Rosenbrock,  1960) 

/(*)  =  1  +  £(l00(*f  —  +  (1  —  *.)2)- 

t»2 

The  Hessian  matrix  for  this  function  is  tridiagonal. 

If  B0  =  I,  and  *0  =  (—1.2,  1,  —1.2,  1,  1,  1,  ...,  l)r,  then  /0  =  533.4 
and  go  —  (—215.6,  792,  —655.6,  —88,  0,  0,  . . . ,  0)r.  The  solution  of  B0Po  = 
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—go  is  po  =  —go-  With  a  linesearch  accuracy  17  =  .9,  a  is  taken  as  .000958. 
Clearly,  the  last  n  —  6  rows  and  columns  of  Q  are  all  sero.  The  updated  Hessian 
Bi  then  has  all  sero  off  diagonal  elements  in  the  last  n  —  5  columns.  On  the 
next  iteration  the  last  »  —  7  rows  and  columns  of  Q  are  all  sero.  A  total  of 
»  —  6  iterations  are  required  before  Q  becomes  positive-definite.  (However,  for 
this  particular  example,  the  system  of  equations  Q\  =  r  is  consistent  and  the 
Hessian  approximations  generated  do  indeed  satisfy  the  quasi-Newton  condition 
(3.1).) 


6.  Storage  and  Efficiency 

The  dense  quasi-Newton  updates  are  all  either  rank-one  or  rank-two  sym¬ 
metric  updates  and  most  of  them  yield  a  positive-definite  Hessian  approxima¬ 
tion  when  yT s  >  0  (an  exception  being  the  Powell-Symmetric-Broyden  update). 
Provided  that  these  low  rank  updates  yield  positive-definite  Hessian  approxima¬ 
tions,  it  is  possible  to  update  the  Cholesky  factors  of  the  Hessian  in  0(na)  opera¬ 
tions  (for  example,  see  Gill,  Golub,  Murray,  and  Saunders,  1974).  Hence,  solv¬ 
ing  Bp  =  —g  at  each  iteration  to  obtain  a  descent  direction  p  requires  0(n2) 
arithmetic  operations. 

In  the  sparse  case,  however,  the  updates  are  all  of  rank  n  and  there  does 
not  exist  any  known  method  of  efficiently  updating  the  factors  of  the  Hessian 
approximation.  This  necessitates  solving  a  new  linear  system  of  equations  to 
obtain  a  descent  direction  p  in  each  iteration  of  the  algorithm.  Furthermore, 
updating  the  Hessian  approximation  at  each  iteration  requires  the  solution  of 
an  additional  linear  system  of  equations  of  the  form  Q\  =  r  (see  Section  3) 
where  Q  has  the  same  sparsity  pattern  as  the  Hessian  matrix.  In  general,  the 
Cholesky  factorisation  requires  0(ns)  arithmetic  operations,  and,  hence,  it  would 
be  undesirable  to  solve  even  one  system  of  linear  equations  at  each  iteration. 
However,  it  is  expected  that  the  presence  of  sparsity  would  reduce  the  arithmetic 
operations  needed  to  compute  the  Cholesky  factors  from  scratch  to  0(n2)  or  less. 

In  the  dense  case  the  factors  are  updated  at  each  iteration  and  hence  we 
need  space  only  to  store  the  factors.  In  the  sparse  case,  however,  a  copy  of 
the  Hessian  approximation  as  well  as  its  Cholesky  factors  must  be  stored.  Extra 
space  is  not  required  for  Q,  since  Q  can  be  stored  at  the  end  of  the  vector  storing 
the  lower-triangular  factors  of  the  Hessian  approximation,  and  the  factors  of  Q 
can  be  overwritten  on  Q. 


8.  Convergence 

It  is  important  to  be  able  to  show  that  an  algorithm  using  the  sparse 
quasi-Newton  updates  of  Section  3  generates  a  sequence  of  points  {x*}£L0  that 
converge  to  a  point  t  such  that  g{x*)  —  0.  For  a  proof  of  one  such  result  see 
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Thapa,  1980.  Being  able  to  prove  that  an  algorithm  converges  it  not  sufficient 
to  make  it  useful  from  the  point  of  view  of  practical  applications.  It  is  usually 
desirable  to  have  a  fast  rate  of  convergence.  Toint  (1979a)  showed  that  under 
certain  conditions  his  algorithm,  which  employs  a  trust-region  procedure  to 
solve  for  the  descent  direction,  converges  superlinearly.  This  is  an  interesting 
theoretical  result;  however,  it  did  not  hold  in  many  tests  performed  by  the  author. 

Table  6.1  shows  the  last  few  iterations  of  Toint’s  sparse  quasi-Newton  method 
using  a  modified  Cholesky  factorisation  to  solve  for  the  descent  direction.  If 
Toint’s  (1979a)  theory  of  superlinear  convergence  is  to  hold  we  should  observe 
superlinear  convergence  at  this  stage  regardless  of  whether  we  use  a  trust  region 
strategy  or  a  modified  Cholesky  factorisation  to  solve  for  the  descent  direction. 
However,  superlinear  convergence  usually  is  not  observed  in  practice,  as  Table 
6.1  indicates.  The  table  is  typical  of  numerous  tests  in  which  superlinear  con¬ 
vergence  was  not  observed,  even  when  the  algorithm  was  started  close  to  the 
solution  with  the  exact  Hessian  matrix! 

A  number  of  possible  reasons  exist  to  explain  the  failure  of  superlinear 
convergence  in  practice.  One  of  the  reasons  is  that  the  sequence  of  updates 
generated  by  Toint’s  method  consistently  failed  to  remain  positive-definite  even 
close  to  the  solution  (as  shown  by  the  nonsero  entries  in  column  Pd  of  Table  6.1 
which  indicates  that  the  Hessian  matrix  was  modified).  Another  reason  is  that 
the  region  around  the  solution  in  which  Z*  converges  superlinearly  is  so  small 
that  the  limitations  of  finite  precision  make  it  impossible  to  improve  the  initial 
estimates. 
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TABLE  6.1 

Toint’s  Update  on  Teat  Function  Genrose 
Dimension  4,  linesearch  accuracy  0.9 


8* 


ft 

Ok 

fk-!m 

■isun 

Pd 

\\9kh 

46 

.011 

7.3  X  10-11 

4.4  x  10“T 

.14 

3.9  X  10“4 

47 

.947 

1.8  X  10“ 14 

2.3  X  10“T 

3.0  X  10-# 

48 

1.00 

8.6  X  10“ 15 

1.9  X  10“7 

5.7 

1.3  X  10~7 

49 

.223 

6.6  X  10“ 16 

1.1  X  10“9 

0.0 

1.3  X  10~8 

50 

.911 

8.7  X  10- 19 

1.3  X  10— 9 

0.0 

2.8  X  10“8 

51 

.775 

0.0 

9.7  X  10~10 

mim 

3.7  X  10“9 

* 

a* 

Xk 

* 

X 

r 

PD 

9k 


KEY 

-  Iteration  number. 

-  Step  length  at  iteration  k. 

-  Approximation  to  the  solution  at  iteration  k. 

-  The  solution. 

-  Function  value  at  x* . 

-  Maximum  addition  to  the  diagonal  during  factorisation.  A  nonsero 
value  indicates  an  indefinite  matrix. 

-  Gradient  at  the  point  a*. 


7.  Numerical  Results  and  Discussion 

In  this  section  we  discuss  the  numerical  performance  of  the  various  quasi- 
Newton  type  algorithms.  The  algorithms  were  tested  on  a  wide  range  of  prob¬ 
lems.  Thus,  it  is  hoped  that  the  numerical  results  will  be  valuable  in  analys¬ 
ing  the  strong  and  weak  points  of  the  various  methods,  and  determining  the 
circumstances  under  which  the  methods  are  most  successful. 

Section  7.1  discusses  criteria  for  comparing  the  numerical  performance  of 
some  of  the  algorithms  tested.  A  key  to  the  algorithms  tested  is  given  in 
Section  7.2  and  the  test  problems  used  for  the  comparison  of  these  algorithms  are 
described  in  Appendix  C.  The  last  section  discusses  the  numerical  results.  Only 
a  small  sample  of  the  algorithms  tested  are  discussed  here.  For  more  detailed 
numerical  results,  see  Thapa  (1980). 


7.1  Basis  far  Comparison 

For  the  purpose  of  comparing  algorithms  it  is  necessary  to  have  a  uniform 
standard  of  comparison  (Gill  and  Murray,  1979a),  which  will  be  called  an  assess¬ 
ment  criterion.  A  termination  criterion  is  not  a  desirable  basis  for  comparison 
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for  at  least  two  reasons.  Firstly,  there  is  no  universal  agreement  on  the  best 
termination  criterion  for  any  given  situation.  Secondly,  a  wide  variation  in  ac¬ 
curacy  of  the  solution  may  be  obtained  for  two  different  algorithms  using  the 
same  termination  criterion. 

The  assessment  criterion  used  here  is  that  suggested  by  Gill  and  Murray 
(1979a).  The  first  point  z*  is  taken  for  which 


/*  —  /*<  1*(1  +  l/*l)- 


In  all  the  tests  carried  out,  i>  was  chosen  to  be  10~ 5. 


(7.1.1) 


7.2  Key  to  the  Algorithms  Tested 

This  section  lists  the  algorithms  tested.  All  the  algorithms  described  are 

implemented  with  the  sparse  modified  Cholesky  factorization  (see  Thapa,  1980), 

the  line  search  of  Gill  et  al.  (1979b),  and  the  assessment  criterion  of  Section  7.1. 

Three  types  of  sparse  quasi-Newton  are  described  here. 

TOINT  -  The  update  described  by  Toint  (Section  2)  to  maintain  a  sparse 
approximation  to  the  Hessian  matrix  that  satisfies  the  quasi- 
Newton  condition. 

BFGS  -  The  update  suggested  by  Shanno  (Section  2)  to  modify  the  BFGS 
update  to  obtain  a  sparse  approximation  that  satisfies  the  quasi- 
Newton  condition. 

DFP  -  The  Davidon-Fletcher-Powell  update  (see,  for  example,  Dennis 

and  Mor6,  1977)  modified  (as  described  in  Theorem  3.2)  to  yield 
a  sparse  approximation  that  satisfies  the  quasi-Newton  condition. 

The  following  algorithms  were  compared  with  the  above  algorithms: 

QNM  -  A  quasi-Newton  method  using  the  full  n  X  n  BFGS  update  to 
approximate  the  Hessian  matrix. 

UBFGS  -  This  method  updates  the  sparse  Cholesky  factors  by  using  al¬ 
gorithm  Cl  described  in  the  paper  by  Gill,  Golub,  Murray  and 
Saunders  (1974)  and  ignoring  all  fill-in  (in  the  factors)  when  the 
dense  BFGS  update  is  used  to  approximate  the  Hessian  matrix. 

DFD  -  Direct  method  for  finite  differencing  (See  Powell  and  Toint,  1979) 

with  the  sparse  modified  Cholesky  factorisation. 
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7.3  Test  Problems 

The  generation  of  an  adequate  set  of  test  problems  to  compare  a  set  of 
algorithms  is  not  an  easy  task.  It  is  important  that  a  large  carefully-selected 
set  of  problems  should  be  used  to  test  the  algorithms.  The  set  of  problems 
should  be  sufficiently  diverse  so  that  one  or  more  of  the  algorithms  do  not  exploit 
peculiarities  common  to  the  set  by  adjusting  certain  free  parameters  in  the 
algorithms.  Furthermore,  the  set  of  problems  should  be  chosen  to  appropriately 
test  the  algorithms  under  consideration.  For  example,  it  is  pointless  to  test 
sparsity  exploiting  algorithms  exclusively  on  problems  of  small  dimension.  On 
the  other  hand,  testing  large  problems  can  become  prohibitively  expensive  in 
terms  of  CPU  time.  A  list  of  the  test  problems  and  the  starting  points  used  by 
the  algorithms  on  these  problems  can  be  found  in  Appendix  C. 

7.4  Discussion  of  Numerical  Results 

All  the  algorithms  are  coded  in  double  precision  Fortran  IV.  The  runs  were 
made  on  a  DEC-20  System,  for  which  the  machine  precision,  e,  is  approximately 
10-18,  and  the  largest  number  representable  is  approximately  1088. 

A  total  of  27  problems  were  solved.  Each  problem  was  solved  using  the 
values  0.9,  0.1  and  0.001  for  17,  the  accuracy  of  the  line  search  (see  Section  4.1). 
Many  different  algorithms  (see  Thapa,  1980),  including  those  described  in  Section 
7.2,  were  tested.  Each  algorithm  requires  two  parameters  in  addition  to  the  line 
search  accuracy:  Smam,  a  bound  upon  the  change  ||z*+i  —  z*||  at  each  iteration, 
and  ftit,  an  estimate  of  the  value  of  the  objective  function  at  the  solution.  In  all 
the  cases  the  value  of  5mu(see  Section  7.4.3)  was  set  to  10*  (essentially  implying 
an  upper  bound  of  infinity),  and  the  value  of  /«,<  was  set  to  the  value  of  /(z)  at 
the  solution.  The  algorithms  also  require  an  estimate  of  the  space  required  by 
the  nonseros  of  the  Cholesky  factors  of  the  Hessian  matrix. 

The  results  of  all  the  tests  are  displayed  in  Appendices  A  and  B.  Appendix 
A  contains  tables  of  storage  requirements  of  the  Hessian  matrix  and  its  Cholesky 
factors;  and  execution  times  of  the  various  routines.  Appendix  B  contains  the 
results  of  testing  the  algorithms  described  in  Section  7.2. 

7.4.1  Storage  Required  for  the  Hessian  Matrix  and  Us  factors. 

Table  A1  in  Appendix  A  is  a  comparison  of  the  space  required  in  double 
word  lengths  by  the  Hessian  matrix  and  its  Cholesky  factors  for  the  dense 
quasi-Newton  method,  one  finite-difference  scheme  and  the  sparse  quasi-Newton 
methods.  For  details  on  the  computation  of  these  numbers,  see  Thapa  (1980). 
It  is  interesting  to  see  that  for  a  function  like  the  7Diagonal,  the  space  required 
by  the  sparsity-exploiting  methods  is  not  much  less  than  the  space  required  by 
a  dense  quasi-Newton  method.  This  is  because  there  is  considerable  fill-in  in  the 


Time  Requirement* 


10 


§7.4.2 


Cholesky  factors  of  the  Hessian  matrix.  For  situations  such  as  these,  where  the 
factors  fill  in  considerably,  it  is  necessary  to  obtain  a  symmetric  permutation  of 
the  matrix  that  would  reduce  the  fill-in.  However,  even  this  may  not  necessarily 
resolve  the  difficulty  (as  is  the  case  for  the  7 Diagonal  function).  For  large 
problems  with  considerable  fill-in,  the  modified-Newton  algorithm  utilising  a 
finite-difference  scheme  to  obtain  the  Hessian  matrix  can  be  used  by  rejecting  all 
fill-in  in  the  factors,  or  by  rejecting  some  fill-in  by  utilizing  some  sort  of  a  partial 
factorization  scheme  (see  Thapa,  1980).  It  is  interesting  to  compare  the  space 
requirements  for  the  calculus  of  variation  problems  ranging  in  dimension  from 
10  to  1000.  These  problems  have  a  block-tridiagonal  structure.  For  n  =  10,  the 
sparse  quasi-Newton  methods  require  more  space  than  the  dense  quasi-Newton 
methods.  As  n  grows  large,  the  space  required  by  the  dense  quasi-Newton  method 
grows  rapidly  and  it  becomes  impractical  to  implement  the  method  from  the 
point  of  view  of  the  storage.  The  space  required  by  the  sparse  quasi-Newton 
methods  grows  much  faster  than  for  the  finite-difference  method.  For  example, 
when  n  =  1000  the  space  required  to  store  the  approximation  to  the  Hessian 
matrix  and  its  LDLr  factors,  is  greater  for  sparse  quasi-Newton  methods,  than 
for  a  modified-Newton  method  using  the  direct  method  (Algorithm  DFD)  for 
finite-differencing,  by  2745  double  word  lengths.  Thus,  the  maximum  size  of 
the  problem  that  can  be  solved  by  the  sparse  quasi-Newton  method  is  smaller 
than  the  maximum  size  of  the  problem  that  can  be  solved  by  a  modified-Newton 
algorithm  utilizing  a  finite-difference  approximation  scheme. 


7.4.2  Time  Required  for  Different  Tasks. 

Table  A2  in  Appendix  A  compares  the  CPU  time  in  seconds  for  the  various 
tasks  performed  in  each  of  the  algorithms.  All  the  numbers  in  the  table  were 
obtained  by  one  computer  run  when  the  system  was  free  of  any  other  jobs.  Each 
task  (excepting  GENPAT)  was  executed  50  times  and  the  average  time  is  reported 
in  Table  A2.  Even  so,  these  numbers  are  not  very  accurate  and  are  merely 
meant  to  compare  the  time  spent  in  executing  the  various  tasks  for  different 
functions.  Note  that  the  time  required  to  obtain  a  finite-difference  approximation 
to  the  Hessian  matrix  includes  the  time  spent  in  evaluating  the  gradient  vectors 
for  the  different  groups.  The  tasks  of  generating  the  pattern  of  the  Hessian 
matrix  (GENPAT),  of  forming  groups  for  finite-differencing  (GROUP),  and  of 
analyzing  the  Hessian  matrix  to  determine  the  fill-in  in  the  Cholesky  factors 
(ANALYZ)  need  be  done  only  once  for  a  given  function.  Once  these  tasks  have 
been  completed,  the  appropriate  information  can  be  read  in.  It  is  especially 
advantageous  to  read  in  the  pattern  of  the  Hessian  matrix,  as  the  results  on 
Calvar2  for  n  =  500  show  (Table  A2).  It  is  interesting  to  note  that  the  time 
spent  in  obtaining  the  Cholesky  factors  (FACTOR)  and  in  solving  a  system  of 
equations  (SOLVE)  is  usually  small  (except  for  the  function  7Diagonal  where 
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there  is  considerable  fill-in  and  the  time  for  FACTOR  is  large).  For  the  chained 
Rosenbrock  function  (ChaRose)  the  time  to  eyaluate  the  function  and  gradient  at 
any  point  is  smaller  than  the  time  spent  in  FACTOR  and  SOLVE.  For  functions 
of  this  type  the  standard  measure  of  function  evaluations  only  is  not  suitable; 
and,  thus  it  would  be  more  useful  to  compare  the  various  algorithms  on  the  basis 
of  numbers  of  iterations  in  addition  to  the  number  of  function  evaluations.  In 
fact,  for  the  functions  ChaRose  and  QOR  the  time  required  to  obtain  a  sparse 
quasi- Newton  update  is  more  than  the  time  required  to  obtain  a  finite-difference 
approximation  to  the  Hessian  matrix.  A  comparison  of  the  times  to  FACTOR 
and  SOLVE  for  the  function  Calvar2  for  n=100  to  500  shows  the  interesting 
fact  that  the  CPU  time  increases  linearly  with  n. 


7.4.3  Influence  of  Stepmx. 

The  numerical  results  obtained  in  our  implementation  of  Toint’s  update 
differ  from  the  results  shown  in  Toint’s  (1978)  paper.  One  of  the  reasons  for  this 
is  that  Toint  (1978)  uses  a  “trust-region  method”  whereas  we  use  a  “step-length 
method” . 

In  both  these  methods  the  ||z*+i  —  z*||  can  be  bounded  by  a  scalar.  In  a 
step-length  algorithm  a  uniform  bound,  5max,  is  used  for  all  iterations;  whereas, 
in  a  trust-region  method,  a  scalar  Ak  (the  sise  of  the  trust-region)  is  adjusted  at 
each  iteration.  By  choosing  different  initial  estimates  of  A0  it  is  possible  to  obtain 
better  results,  as  is  the  case  in  the  results  shown  by  Toint  (1978),  where  A0  is 
varied  considerably  (in  one  case  a  different  value  for  Aq  is  used  when  comparing 
different  algorithms  on  the  same  function).  It  is  possible  to  duplicate,  within 
a  few  function  evaluations,  Toint’s  result  by  choosing  an  appriopriate  value  for 
Smas  in  the  linesearch.  However,  for  uniformity  of  testing,  Smax  "was  set  to  105 
(essentially  implying  an  upper  bound  of  infinity)  on  all  the  functions  tested.  A 
large  upper  bound  was  chosen  to  avoid  biasing  any  of  the  results  —  a  smaller 
bound  could  possibly  influence  the  performance  of  some  of  the  algorithms  on 
certain  functions. 


7.4.4  Comparison  of  the  Algorithms. 

The  function  evaluations  and  iterations  quoted  in  Table  B  (Appendix  B)  are 
all  computed  using  the  assessment  criterion  of  Section  7.1  with  a  tolerance  of 
10~ s.  All  the  algorithms  were  run  to  a  maximum  of  2000  function  evaluations. 
The  algorithms  were  all  still  reducing  the  function  (albeit  slowly)  when  they 
were  stopped  at  2000  function  evaluations.  In  some  cases  some  of  the  algorithms 
failed.  By  this  it  is  meant  that  the  line-search  routine  could  not  find  a  lower  point. 
This  was  usually  due  to  the  fact  that  the  search  direction  p  was  arbitrarily  close 
to  sero  or  almost  orthogonal  to  g.  The  exception  was  the  algorithm  UBFGS, 
where  the  failure  was  due  to  p  not  being  a  descent  direction,  since  the  updated 


i 


ST.4.4 


Compurtoon  of  Algorithm 


21 


factors  had  lost  their  positive-definiteness.  In  all  the  cases  examined,  the  matrix 
Q  (see  Equation  3.12)  had  been  singular  for  many  iterations  before  failure  of  the 
algorithm,  which  meant  that  the  sparse  quasi- Newton  updates  did  not  satisfy 
the  quasi-Newton  condition. 

Table  B  compares  the  function  evaluations  and  iterations  required  by  three 
sparse  quasi-Newton  updates  (TOINT,  BFGS,  DFP).  The  performance  of  these 
algorithms  is  poor  in  general  —  especially  on  the  large  problems.  On  a  few 
functions,  these  methods  do  slightly  better  than  the  finite-difference  algorithm. 
However,  their  performance  is  much  worse  in  terms  of  number  of  iterations 
in  most  of  the  cases  (with  the  exception  of  the  Trigonometric  function).  An 
interesting  result  holds  for  the  quadratic  function  (which  is  a  diagonal  function). 
At  each  iteration  the  new  Hessian  approximation  is  in  fact  obtained  as  a  finite 
difference  approximation  with  the  finite  difference  interval  being  max{|«i|,0}  for 
the  tth  diagonal  element,  where  6  is  given  by 

6  =  max{e||B||,e>, 

where  e  is  the  machine  precision.  That  is, 


"  max{|*<|,5>' 

Besides  these  three  algorithms,  a  sparse  version  of  the  self-scaled  BFGS 
update  was  also  tested  (results  not  shown  here).  The  best  of  these  four  methods 
is  Toint’s  update,  which  is  a  little  surprising  considering  that  the  dense  BFGS 
update  has  been  found  to  be  superior  to  the  other  dense  quasi-Newton  methods 
in  practice.  However,  Toint’s  update  is  by  no  means  competitive  when  compared 
with  a  modified-Newton  algorithm  using  a  finite-difference  scheme  to  generate 
approximations  to  the  Hessian  matrix.  In  fact,  in  spite  of  a  proof  of  superlinear 
convergence  (Toint,  1979a),  superlinear  convergence  was  not  observed  in  the  test 
problems  within  the  current  implementation  of  Toint’s  update  (for  a  discussion, 
see  Section  6). 

On  moderate  size  problems  (n  =  50  to  n  =  100),  the  dense  quasi-Newton 
method  performs  significantly  better  than  the  sparse  quasi-Newton  methods. 
Thus,  there  does  not  seem  to  be  much  truth  in  the  speculation  that  supplying 
more  information  (in  the  form  of  sparsity)  to  quasi-Newton  methods  should  cause 
them  to  converge  faster.  An  interesting  variation  is  the  method  UBFGS  which 
uses  the  Cl  algorithm  (Gill,  Golub,  Murray  and  Saunders,  1974)  to  update  the 
factors  of  the  BFGS  update  ignoring  all  fill-in  in  the  factors.  When  it  does 
converge  its  performance  is  superior  to  that  of  the  sparse  quasi-Newton  methods. 
As  noted  previously,  its  failure  is  due  to  the  loss  of  positive  definiteness  of 
the  product  of  the  Cholesky  factors.  It  is  remarkable  that  this  method  does 
better  than  the  sparse  quasi-Newton  methods  since  the  updates  obtained  by  the 
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UBFGS  method  do  not  satisfy  the  quasi-Newton  condition  (3.1).  However,  these 
good  results  with  UBFGS  should  be  viewed  with  some  caution  since  the  method 
performed  well  mostly  on  functions  with  diagonally  dominant  Hessian  matrices. 

The  table  clearly  shows  the  superiority  of  the  modified- Newton  method  over 
the  others.  In  most  of  the  cases  the  finite-difference  method  does  better  than 
the  others  in  terms  of  function  evaluations.  It  does  better  than  all  the  others  in 
terms  of  iterations,  as  Table  B  shows. 

9.  Conclusions 

All  the  algorithms  were  tested  using  a  modified  Cholesky  factorisation.  The 
overall  conclusions  reached  are  that  sparse  quasi-Newton  methods  perform  poorly 
in  general.  Superlinear  convergence  was  not  observed  and  the  quasi-Newton 
updates  consistently  lost  the  property  of  positive-definiteness.  Furthermore, 
they  require  more  storage  than  modified- Newton  methods  that  utilize  a  finite- 
difference  scheme  that  exploits  sparsity  in  the  Hessian  matrix;  and  they  may 
require  a  significant  amount  of  time  to  perform  the  linear  algebra  needed  to 
obtain  a  sparse  quasi-Newton  update.  Modified- Newton  methods  utilizing  a 
finite-difference  scheme  that  exploit  sparsity  and  symmetry  in  the  Hessian  matrix 
perform  extremely  well.  These  Newton-type  methods  perform  very  well  even 
when  all  fill-in  is  ignored  in  the  modified  Cholesky  factors  (see  Thapa,  1980). 

A  crude  implementation  of  preconditioned  conjugate-gradient  methods  (see 
Thapa,  1980)  that  utilize  sparse  quasi-Newton  updates  was  seen  to  perform  well 
in  comparison  to  sparse  quasi-Newton  methods.  It  is  expected  that  a  refined 
implementation  of  such  methods  may  prove  to  be  very  successful.  However, 
much  work  remains  to  be  done  on  such  methods. 

Surprisingly,  UBFGS,  a  crude  implementation  of  a  method  that  updated  the 
Cholesky  factors  using  algorithm  Cl  of  Gill,  et  al.  (1974)  but  ignoring  all  fill-in  in 
the  factors,  performed  well  on  the  set  of  problems  on  which  the  updated  factors 
remained  positive  definite.  However,  it  should  be  noted  that  UBFGS  performed 
well  mostly  on  functions  that  were  diagonally  dominant.  It  would  be  interesting 
to  develop  this  method  further. 
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TABLE  A1 

Space  Required  in  Double  Word  Lengths  by 
the  Hessian  Matrix  end  iti  Cholesky  factor* 
in  a  Deme  Quad-Newton  Method  compared  to 
a  Modifled-Newton  Method  using  Finite-Difference* 
and  to  Sparse  Quaii-Newton  Methods 
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DFD 
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Quadratic 

n  =25 

325 

57.75 

65 

GenRose 

n  =  25 

325 

02.25 

122.5 

QOR 

n  =  50 

1275 

627.5 

755 

C1GOR 

n  =  50 

1275 

644.5 

820 

7Diagonal 

n  =  60 

1830 

1368 

1530 

CldiaT 

n  =  60 

1830 

1301.25 

1581.25 

Cahrarl 

n  =  10 

55 

52.5 

75 

Cairarl 

»  =20 

210 

112.5 

162.5 

Caharl 

n  =  30 
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250 

Cahrarl 

»  =  50 

1275 
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425 

Cahrarl 

n  =  100 

5050 

502.5 

862.5 

Calsarl 

n  =  200 

20100 

1102.5 

1737.5 

Cahrarl 

n  =  300 

45150 

1702.5 

2612.5 

Cahrarl 

n  =  400 

80200 

2302.5 

3487.5 

Cahrarl 

n  =  500 

125250 

2002.5 

4362.5 

Caivarl 

n  =  1000 

500500 

5002.5 

8737.5 

KEY 

Qtru  -  Dense  quasi-Newton  method. 

osd  -  Direct  method  for  finite-differencing. 

kdso  -  New  Direct  method  for  finite-differencing. 

sp«n  -  Sparse  quasi-Newton  update. 


TABLE  A3 

CPU  ncondt  Required  for  Various  Tuks  Performed  by 
the  Unconstrained  Optimisation  Algorithm* 
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FACTOR 
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TOIMT 
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KEY 

-  Generate  the  pattern  of  the  Hessian  matrix. 

-  Form  the  groups  for  the  Direct  Method  for  finite- differencing. 

-  Determine  the  pattern  of  the  LDLT  factors. 

-  Obtain  the  LDLT  factors. 

-  Sohre  an  n  x  n  system  of  equations. 

-  Compute  the  function  value  and  the  gradient  vector. 

-  Direct  Method  for  finite-differencing. 

-  Toint’s  sparse  quasi- Newton  update. 

-  Sparse  version  of  the  sroe  update. 

-  Sparse  version  of  the  dtt  update. 
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TABLE  B 

Number  of  Function  Evaluations  and 
IttrAtions  Required 
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114,56 

36,10 

22,3 

37,16 

n=  10 

n  =  .ooi 

73,31 

46,20 

04,42 

46,20 

26,3 

42,16 

CahrarS 

V  —  -9 

152,05 

• 

200,130 

54,45 

20,4 

57,48 

Start  1 

q  =  .l 

128,62 

107,54 

240,118 

70,35 

24,3 

67,30 

n  =  20 

ri  =  .001 

151,61 

* 

147,64 

83,36 

27,3 

77,30 

GenRose 

*7=0 

48,25 

40,20 

88,55 

134,63 

61,13 

107,43 

Start  S 

»>  =  .l 

71,26 

130,64 

76,34 

163,64 

65,11 

130,40 

n  =  25 

T)  =  .001 

104,28 

01,27 

104,37 

178,51 

76,11 

157,44 

ChaRose 

*7=0 

61,28 

44,21 

40,23 

68,36 

62,13 

07,48 

Start  5 

V  =  -1 

68,26 

45,10 

45,17 

63,28 

45,8 

116,44 

n  =  10 

v  =  .001 

02,23 

76,20 

70,21 

01,20 

87,12 

165,47 

TOIMT 

eras 

DBF 

varoi 

DFD 

QNW 

«.yy 

* 

F 

NR 


KEY 

-  Toint’  sparse  quasi- Newton  update. 

-  Shanso’s  sparse  seas  update. 

-  Sparse  Torsion  of  the  dtp  update. 

-  Updating  the  Cholesky  factors  of  the  btos  update  ignoring  fill-in. 

-  Direct  method  for  finite-differencing. 

-  Dense  quasi-Newton  method. 

-  Number  of  function  evaluations,  Number  of  iterations. 

-  Exceeded  2000  function  evaluations. 

-  Failed  to  converge. 

-  Not  Run. 


TABLE  B  (continued) 
Number  of  Function  Enina  tionf  and 
iteration*  Required 


KEY 

toikt  -  To  inf  apane  quasi-Nswton  update, 
arai  -  Shanno’s  apane  irae  update, 
oar  -  Sparse  version  of  the  oar  update. 

onaos  -  Updating  the  Cholesky  factor*  of  the  naas  update  Ignoring  AD- in. 
nan  -  Direct  method  for  finite-differencing. 

«m  -  Dense  quasi-Newton  method. 

xxjrjr  -  Number  of  fraction  evaluations,  Number  of  iterations. 

*  -  Exceeded  2000  function  evaluations. 

F  -  Failed  to  converge. 

NR  -  Not  Run. 
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TABLE  B  (continued) 
Number  of  Function  Evaluations  and 
iteration#  Required 


PltOBLB  M 

TOINT 

BSOS 

DVP 

vara  s 

DVD 

emi 

Cahrarl 

n  =  •» 

• 

* 

1913,858 

220,157 

38,5 

191,162 

Start  1 

V  =  .1 

* 

* 

1502,606 

203,89 

39,5 

299,149 

n  —  50 

n  =  .001 

* 

* 

* 

395,153 

48,5 

258,88 

Calvar2 

»?  =  .0 

739,391 

* 

1682,872 

107,59 

15,2 

53,28 

Start  1 

V  =  .1 

1052,460 

* 

1967,963 

111,54 

15,2 

54,28 

n  =  50 

f]  =  .001 

1119,440 

* 

* 

119,52 

1T.2 

57,28 

CalvarS 

fj  =  .9 

411,242 

* 

1138,627 

128,93 

36,5 

114,90 

Start  1 

V  =  .1 

556,254 

* 

871,406 

169,84 

31,4 

131,59 

n  =  50 

T)  =  .001 

898,350 

* 

1365,568 

203,85 

36,4 

154,59 

Calvarl 

V  =  .8 

* 

* 

* 

478,322 

45,6 

372,298 

Start  1 

V  =  .1 

* 

* 

* 

662,323 

49,6 

384,166 

n  =  100 

v  =  .001 

* 

* 

* 

812,316 

50,5 

476,164 

Calvarfi 

q  =  .9 

* 

* 

* 

199,100 

15,2 

Start  1 

r?  =  -1 

* 

* 

* 

202,100 

15,2 

»  =  100 

V  =  .001 

* 

* 

* 

218,97 

17,2 

CalvarS 

V  —  .9 

1383,757 

* 

* 

243,178 

36,5 

187,138 

Start  1 

*7  =  -1 

1754,721 

* 

* 

306,158 

31,4 

206,87 

»  =  100 

T)  =  .001 

* 

* 

* 

361,154 

36,4 

258,97 

Calvar2 

V  =  .9 

F 

NR 

• 

390,216 

8,1 

NR 

Start  2 

*7  =  .1 

» 

NR 

• 

468,235 

8,1 

NR 

n  =  200 

V  =  .001 

* 

NR 

* 

464,212 

9,1 

NR 

CalvarS 

n  =  .9 

* 

NR 

* 

893,533 

113,14 

NR 

Start  4 

v  =  .i 

• 

NR 

* 

1092,524 

135,15 

NR 

n  =  200 

v  =  .001 

* 

NR 

* 

1164,512 

140,14 

NR 

TOINT 

■ro« 

orr 

uirai 

oro 

ami 

M.yy 

* 

F 

NR 


KEY 

-  Toint’  sparse  quasi-Newton  update. 

-  Shanno’s  sparse  Bros  update. 

-  Sparse  version  of  the  d rr  update. 

-  Updating  the  Cholesky  factors  of  the  eras  update  ignoring  fill-in. 

-  Direct  method  for  finite-differencing. 

-  Dense  quasi-Newton  method. 

-  Number  of  function  evaluations,  Number  of  iterations. 

-  Exceeded  2000  function  evaluations. 

-  Failed  to  converge. 

-  Not  Run. 


APPENDIX  C 


C.l  Test  Problems 

For  the  purpose  of  comparing  the  algorithms  described  in  Section  7.2,  test 
problems  ranging  in  siie  from  dimension  10  to  dimension  500  were  used.  The 
problems  tested  are  described  below.  The  following  notation  will  be  used  in  the 
rest  of  the  paper. 

17 g  -  Number  of  nonseros  below  the  diagonal  of  the  Hessian  matrix. 

17 l,  -  Number  of  nonzeros  below  the  unit  diagonal  of  the  lower-triangular 

C  hole  sky  factor. 

Nqd  -  Number  of  groups  used  to  obtain  the  finite  difference  approximation 
to  the  Hessian  by  the  Powell- Toint  direct  method  (Algorithm  D  in 
Thapa,  1980). 

Test  Function  1.  GenRose  (Gill  and  Murray,  1979b) 

This  is  a  generalization  of  the  well  known  two-dimensional  Rosenbrock 
function  (Rosenbrock,  1960) 

m = i + E(i°°(*  -  + 11  -  *<)')• 

Um  2 

The  Hessian  matrix  is  tridiagonal  and 

17 g  —  n  —  1,  17 l  =  n  —  1, 

Ngd  —  3. 

Test  Function  2.  ChaRose  (Toint,  1978) 

25 

/(at)  =  1  +  23  —  *?)*  +  (!  —  *<)*)» 

i—2 

This  is  a  modification  and  generalization  of  the  well  known  two-dimensional 
Rosenbrock  function  (Rosenbrock,  1960)  where  the  constants  a,  are  given  by 
Toint  (1978).  The  Hessian  matrix  is  tridiagonal  and 

17a  —  24,  17i,  —  24, 

Ngd  —  3. 

Test  Function  3.  QOR  (Toint,  1978) 

60  33  /  \2 

/(*)  =  +  H  ^  *j  )» 

<-l  <- 1  V  /6A(0  ' 

where  the  constants  a,-,  d,  and  the  sets  A(t),  B(i)  are  described  by  Toint 
(1978).  The  function  is  convex  with  a  sparse  Hessian  matrix  and 

17 a  =  115,  17 L  =  389, 

Nqd  —  8- 
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Test  Function  4.  GOR  (Tout,  1978) 

so  ss 

/(*)  =  ]C  +  5Z 

t— 1  i=  1 

where 

,  ,(m.\  —  J  a<  *<  to&O-  +  *»),  *<  >  0, 

t{  %)  a,*,log,(l  +  *f),  Xi<  0, 

Vi  —  di—  53  *J  +  53  *i 
i€A(t)  J€B(<) 

and 

x  _  ioge(i  +  y*),  y<  >  o» 
t(y,)  W?,  Vi  <  o. 

The  constants  a,,  d,  and  the  sets  A(i),  B(i)  are  the  same  as  for  QOR.  The 
function  is  convex  but  has  discontinuous  second  derivations.  The  Hessian  matrix 
has  the  same  sparsity  pattern  as  the  Hessian  matrix  for  QOR.  The  quantities 
7Jg,  Wl,  and  NGD  are  the  same  as  for  QOR. 

Test  Function  5.  PSP  (Toint,  1978) 


/(*)  =  53  a<(Xi  ~  5)2  +  53  PMVi), 


where 

Vi  =  di  ~  53  *,+  53  Xj, 

i€A(0  j€B(t) 

h(v)  =  V'lV*' 

<KVxi  \100(0.1  -  Vi)  +  10,  Vi  <  0.1, 

The  constants  a,-,  d<  and  the  sets  A(i),  B(t)  are  the  same  as  for  QOR.  The 
Hessian  matrix  has  the  same  sparsity  pattern  as  the  Hessian  matrix  for  QOR. 
The  quantities  77G,  and  Nqd  are  the  same  as  for  QOR. 

Test  Function  6.  Quadratic  (Gill  and  Murray) 


where  p  is  a  constant.  The  function  is  convex  with  a  diagonal  Hessian  matrix 
that  is  ill-conditioned  and 

tfo  =  0,  =  0, 

Nod  —  1. 


75Tl  =  0, 


30 

The  next  three  functions  are  similar  to  those  that  arise  in  the  numerical  solution 
of  optimal  control  problems.  The  general  continuous  problem  is  of  the  form 

min  J(z{t))  =  f  /(t,  z(t),  **(*))  dt, 

subject  to  the  boundary  conditions  z(0)  =  a,  x(  1)  =  6.  These  problems  are 
known  as  calculus  of  variations  problems.  A  numerical  procedure  to  solve  these 
problems  is  to  discretize  them.  The  first  three  functions  described  below  are 
discretized  by  expressing  x(t)  as  a  linear  sum  of  functions  that  span  the  space  of 
piecewise  cubic  polynomials.  This  gives  rise  to  a  function  with  a  block  triangular 
Hessian  matrix. 

Test  Function  7.  Calvar  (Gill  and  Murray,  1973) 

J(*(t))  =  Jq  j*(t)2  +  *'(t)  tan-1  z'(t)  —  log(l  +  z'(t)2)*j  dt, 

with  the  boundary  conditions  z(0)  =  1,  z(l)  =  2.  The  Hessian  matrix  is  block 
tridiagonal  and 

T?g  —  — 5-f*2.5n,  — 

Nqd  —  6. 

Test  Function  8.  Calvar2  (Gill  and  Murray,  1973) 

■>(m) = l  {ioo(*w  -  *>(«)»)’ + (i  -  *•(«))’}  dt, 

with  the  boundary  conditions  z(0)  =  z(l)  =  0.  The  Hessian  matrix  is  block 
tridiagonal  and 

tfo  =  — 5  -+*  2.5n,  T7l  =  TTq, 

Ngd  =  6. 

Test  Function  9.  Calvar3  (Gill  and  Murray,  1973) 

with  the  boundary  conditions  z(0)  =  1,  z(l)  =  0.  The  Hessian  matrix  in  block 
tridiagonal  and 


JTa  =  —  5  +  2.5n, 
Nod  —  6. 


~Rl  —  "R Qt 


SI 


Test  Function  10.  7-Diagonal  (Toint,  1978) 

This  is  a  modified  version  of  the  function  described  in  Toint ’s  paper. 

Let 


then 


where 

Vi  =  — ^)*i  +  2*a  “ 

Vi  ~  *»— 1  (3  "2")**  ^*»-M  1  i  —  2, . . .  ,n, 

Vto  =  *5«  —  (3  —  0  —  1. 

The  Hessian  matrix  has  the  pattern  shown  in  Toint(1978)  and 

ff<3  =  147,  Nt  =  957, 

Nod  =  8. 

Test  Function  11.  Trigonometric  (Toint,  1978) 

This  is  a  modification  of  the  function  suggested  by  Toint.  The  modification 
guarantees  that  the  same  minimum  is  found  by  all  the  algorithms  if  the  same 
starting  point  is  used. 

Choose  a  set  of  pairs  of  indices  J  =  {(*,  3)  \  1  <  *  <  n,  and  1  <  3  <  t’} 
Let 

/(*)=  J2  sin^iij  +  PjTi  +  c#]; 


/(*)  =  53  +  *<+*olt; 

*™1  i»l 

60 

/(*)  =  /(*) + 53(*< — 5)2, 


/(*)  =  /(*)  +  £(*.■  “  5)a> 


where 


<*ij  —  S[1  +  mod(s,  5)  +  mod  (j,  5)], 

,  _(»+ i) 

ioT' 

The  Hessian  matrix  has  its  pattern  defined  by  the  set  J  and 

TTq  =  268,  T7l  =  1212, 
N<3D  S=5  9. 


S3 


Teat  Function  12.  C1GOR 

This  if  a  combination  of  the  Calvarl  (Test  Function  7)  and  GOR  (Test 
Function  4)  with  n  =  50.  That  is, 


/(*)  =  /GOR  +  /Calvarl* 

The  Hessian  matrix  has  a  pattern  that  is  a  combination  of  the  patterns  of  GOR 
and  the  1st  calculus  of  variations  and 


7 7a  =  163,  J7l  =  393, 

Ncd  —  9. 


Teat  Function  13.  C1DIA7 

This  is  a  combination  of  Calvarl  (Test  function  7)  and  the  7-diagonal  func¬ 
tion  (Test  function  11)  for  n  =  60.  That  is, 


/(*)  —  /Calvarl  ^-Diagonal* 


The  Hessian  matrix  has  a  pattern  that  is  a  combination  of  the  patterns  of  the 
1st  calculus  of  variations  and  the  7  diagonal  function;  and 


^<?  =  175,  7Tl  =  970, 

Ngd  —  9. 


C.1  Stsrting  Points 


The  starting  points  used  are  as  follows. 
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SOL  81-12:  OPTIMIZATION  OF  UNCONSTRAINED  FUNCTIONS  WITH  SPARSE 

HESSIAN  MATRICES  —  QUASI-NEWTON  METHODS;  by  M.N.  Thapa 

Newton-type  methods  and  quasi -Newton  methods  have  proven  to  be  very 
successful  in  solving  dense  unconstrained  optimization  problems. 

Recently  there  has  been  considerable  interest  in  extending  these 

methods  to  solving  large  problems  when  the  Hessian  matrix  has  a  known 
a  priori  sparsity  pattern.  This  paper  treats  sparse  quasi -Newton 
methods  in  a  uniform  fashion  and  shows  the  effect  of  loss  of 

positive-definiteness  in  generating  updates.  These  sparse 

quasi-Newton  methods  coupled  with  a  modified  Cholesky  factorization  to 
take  into  account  the  loss  of  positive-definiteness  when  solving  the 
linear  systems  associated  with  these  methods  were  tested  on  a  large 

set  of  problems.  The  overall  conclusions  are  that  these  methods 
perform  poorly  in  general  —  the  Hessian  matrix  becomes  indefinite 

even  close  to  the  solution  and  superlinear  convergence  is  not  observed 
in  practice. 
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